Fix: strip diacritics in step 3 CSV filter by jakebromberg · Pull Request #2 · WXYC/discogs-cache

jakebromberg · 2026-02-12T03:48:40Z

Summary

The library stores ASCII artist names ("Bjork"), but Discogs uses diacritics ("Björk").
Step 3's normalize_artist() only did .lower().strip(), so "björk" != "bjork", and all releases for those artists were silently excluded from the cache.
It now uses unicodedata.normalize() with the 'NFKD' form to decompose accented characters and strip the diacritics before comparing, matching the approach already used in step 8's verify_cache.py.
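
For readers who have not seen the diff, here is a minimal sketch of the kind of normalization described above. It is not copied from the repository: the function body, and the choice to drop combining marks with unicodedata.combining(), are assumptions about how the diacritics get stripped after NFKD decomposition.

```python
import unicodedata


def normalize_artist(name: str) -> str:
    """Lowercase, trim, and strip diacritics from an artist name (illustrative sketch)."""
    # NFKD decomposition splits "ö" into "o" followed by U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFKD", name)
    # Assumption: drop the combining marks so only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().strip()


library_name = "Bjork"  # ASCII form, as stored in the library
discogs_name = "Björk"  # form with diacritics, as stored in Discogs

# With only .lower().strip() these two would differ; after stripping they match.
print(normalize_artist(discogs_name) == normalize_artist(library_name))
```

One caveat of this approach: letters without a Unicode decomposition, such as "ø", pass through unchanged, so they would need separate handling if the library stores them as plain ASCII.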

Test plan

Existing TestNormalizeArtist cases still pass
New parametrized cases for Bjork, Sigur Ros, Motorhead, Husker Du, Cafe Tacvba, Zoe
Full test_filter_csv.py suite passes (28 tests)
Next pipeline run should pick up the previously missed artists whose names contain diacritics
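
The new cases could look roughly like the sketch below. It is written against the standard library's unittest with subTest rather than the parametrized style the test plan mentions, so it runs without extra dependencies. The normalize_artist() here is a hypothetical stand-in, not the project's helper, and the accented spellings are assumed, since the plan lists only the ASCII forms.

```python
import unicodedata
import unittest


def normalize_artist(name: str) -> str:
    # Hypothetical stand-in for the step 3 helper: NFKD-decompose,
    # drop combining marks, then lowercase and trim.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().strip()


# (assumed Discogs spelling, library spelling) for the artists in the test plan.
CASES = [
    ("Björk", "Bjork"),
    ("Sigur Rós", "Sigur Ros"),
    ("Motörhead", "Motorhead"),
    ("Hüsker Dü", "Husker Du"),
    ("Café Tacvba", "Cafe Tacvba"),
    ("Zoé", "Zoe"),
]


class TestNormalizeArtistDiacritics(unittest.TestCase):
    def test_discogs_spelling_matches_library_spelling(self):
        for discogs_name, library_name in CASES:
            # subTest reports each artist separately, much like a parametrized case.
            with self.subTest(artist=library_name):
                self.assertEqual(
                    normalize_artist(discogs_name),
                    normalize_artist(library_name),
                )
```

Saved to a file, this can be run with python -m unittest.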

The library stores ASCII names ("Bjork") but Discogs uses diacritics ("Björk"). The step 3 filter compared with .lower().strip() only, so all releases for artists whose names contain diacritics were silently excluded from the cache.

jakebromberg merged commit 27cdc42 into main Feb 12, 2026
3 checks passed

jakebromberg merged 1 commit into main from fix/filter-diacritics
